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Abstract 

^ I This paper derives a distributed Kalman filter to estimate a sparsely connected, large-scale, n— dimensional, dy- 

namical system monitored by a network of N sensors. Local Kalman filters are implemented on the {ni —dimensional, 

1 ^ I where ni <^ n) sub-systems that are obtained after spatially decomposing the large-scale system. The resulting sub- 

systems overlap, which along with an assimilation procedure on the local Kalman filters, preserve an Lth order 
^ Gauss-Markovian structure of the centralized error processes. The information loss due to the Lth order Gauss- 

Markovian approximation is controllable as it can be characterized by a divergence that decreases as L f. The order 
of the approximation, L, leads to a lower bound on the dimension of the sub-systems, hence, providing a criterion for 
sub-system selection. The assimilation procedure is carried out on the local error covariances with a distributed iterate 
collapse inversion (DICI) algorithm that we introduce. The DICI algorithm computes the (approximated) centralized 



00 

o 

Riccati and Lyapunov equations iteratively with only local communication and low-order computation. We fuse the 
observations that are common among the local Kalman filters using bipartite fusion graphs and consensus averaging 
^ ^ algorithms. The proposed algorithm achieves full distribution of the Kalman filter that is coherent with the centralized 

Kalman filter with an Lth order Gaussian-Markovian structure on the centralized error processes. Nowhere storage, 
^ communication, or computation of n— dimensional vectors and matrices is needed; only n; <C n dimensional vectors 

and matrices are communicated or used in the computation at the sensors. 

Index Terms 

Large-scale systems, sparse matrices, distributed algorithms, matrix inversion, Kalman filtering, distributed esti- 
mation, iterative methods 
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I. Introduction 

Centralized implementation of the Kalman filter [1], [2], although possibly optimal, does not provide robustness 
and scalability when it comes to complex large-scale dynamical systems with their measurements distributed on a 
large geographical region. The reasons are twofold: (i) the large-scale systems are very high-dimensional, and thus 
extensive computations are required to implement the centralized procedure; and (ii) the span of the geographical 
region, over which the large-scale system is deployed or the physical phenomenon is observed, poses a large 
communication burden and thus, among other problems, adds latency to the estimation mechanism. To remove 
the difficulties posed by the centralized procedure, we propose a distributed estimation algorithm with a low order 
Kalman filter at each sensor. To account for the processing, communication, and limited resources at the sensors, the 
local Kalman filters involve computations and communications with local quantities only, i.e., vectors and matrices 
of low dimensions, ni <§; n, where n is the dimension of the state vector — no sensor computes, communicates, or 
stores any n— dimensional quantity. 

Much of the existing research on distributed Kalman filters focuses on sensor networks monitoring low di- 
mensional systems, e.g., when multiple sensors mounted on a small number of robot platforms are used for 
target tracking [3], [4], [5]. This scenario addresses the problem of how to efficiently incorporate the distributed 
observations, which is also referred to in the literature as 'data fusion,' see also [6]. Data fusion for Kalman filters 
over arbitrary communication networks is discussed in [7], using consensus protocols in [8]. References [9], [4] 
incorporate packet losses, intermittent observations, and communication delays in the data fusion process. Although 
these solutions work well for low-dimensional dynamical systems, e.g., target tracking applications, they implement 
a replication of the n— dimensional Kalman filter at each sensor, hence, communicating and inverting nxn matrices 
locally, which, in general, is an 0{n^) operation. In contrast, in the problems we consider, the state dimension, n, 
is very large, for example, in the range of 10^ to 10*^, and so the nth order replication of the global dynamics in 
the local Kalman filters is either not practical or not possible. 

Kalman filters with reduced order models have been studied, in e.g., [10], [llj to address the computation burden 
posed by implementing nth order models. In these works, the reduced models are decoupled, which is sub-optimal 
as important coupling among the system variables is ignored. Furthermore, the network topology is either fully 
connected [10], or is close to fully connected [11 J, requiring long distance communication that is expensive. We 
are motivated by problems where the large-scale systems, although sparse, cannot be decoupled, and where, due 
to the sensor constraints, the communication and computation should both be local. Multi-level systems theory to 
derive a coordination algorithm among local Kalman filters is discussed in [12]. 

We present a distributed Kalman filter that addresses both the computation and communication challenges posed 
by complex large-scale dynamical systems, while preserving its coupled structure; in particular, nowhere in our 
distributed Kalman filter, do we require storage, communication, or computation of n— dimensional quantities. We 
briefly explain the key steps and approximations in our solution. 

Spatial Decomposition of Complex Large-Scale Systems: To distribute the Kalman filter, we provide a spatial 
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decomposition of the complex large-scale dynamical system (of dimension n) that we refer to as the overall system 
into several, possibly many, local coupled dynamical systems (of dimension n/, such that ni ^ n) that we refer to 
as sub-systems in the following. The large-scale systems, we consider, are sparse and localized. Physical systems 



with such characteristics are described in section II-A We exploit the underlying distributed and sparse structure 
of system dynamics that results from a spatio-temporal discretization of random fields. Dynamical systems with 
such structure are then spatially decomposed into sub-systems. The sub-systems overlap and thus the resulting local 
Kalman filters also overlap. This overlap along with an assimilation procedure on the local error covariances (of the 
local filters) preserve the structure of the centralized (approximated) error covariances. We preserve the coupling 
by applying the coupled state variables as inputs to the sub-systems. In contrast, an estimation scheme using local 
observers is addressed in [13], where the coupled states are also applied as inputs to the sub-systems, but, the 
error covariances are not assimilated and remain decoupled. Such scheme loses its coherence with the centralized 
procedure, as under arbitrary coupling, no structure of the centraUzed error covariance can be retained by the local 
filters. 

Overlapping Dynamics at the Sub-systems: Bipartite Fusion Graphs: The sub-systems that we extract from 
the overall system overlap. This makes several state variables to be observed at different sub-systems. To fuse this 
shared information, we implement a fusion algorithm using bipartite fusion graphs, which we introduced in [14], 
and local average consensus algorithms [15], [8]. The interactions required by the fusion procedure are constrained 
to a small neighborhood (this may require multi-hop communication in the small neighborhood). The multi-hop 
communication requirements, in our case, arise because the dynamical field is distributed among local sub-systems 
and only local processing is carried out throughout the development. In contrast, as noted earlier, a distributed 
Kalman filter with single hop communication scheme is presented in [7], but, we reemphasize the fact the this 
single-hop communications is a result of replicating nth order Kalman filters at each sensor 

Assimilation of the Local Error Covariances — Distributed Iterate -Collapse Inversion (DICI) Algorithm: 
A key issue in distributing the Kalman filter is to maintain a reasonable coherence with the centralized estimation; 
this is to say that the local error covariances should approximate the centralized error covariance in a well-defined 
manner. If the local error covariances evolve independently at each sub-system they may lose any coherence with the 
centralized error covariance under arbitrary coupling. To address this issue, we employ a cooperative assimilation 
procedure among the local error covariances that preserves a Gauss-Markovian structure in the centralized error 
processes. This is equivalent to approximating the inverse of the centralized error covariances (the information 
matrices) to be L— bandecQ as shown in [16]. The assimilation procedure is carried out with a distributed iterate- 
coUapse inversion (DICI, pronounced die-see) algorithm, briefly introduced in [17], that has an iterate step and a 
collapse step. 



'a measure on the loss of optimality is provided in section 
dimensions of the sub-systems. 



II-C 



and it will be shown that this measure serves as a criterion for choosing the 



^We refer to a matrix as an L-banded matrix (L > 0), if the elements outside the band defined by the Lth upper and Lth lower diagonal are 0. 



November 23, 2009 



DRAFT 



IEEE TRANSACTIONS ON SIGNAL PROCESSING 



4 



We use the Information filter, [18], [11], format of the Kalman filter where the information matrices (inverse of 
the error covariances) are iterated at each time step. We introduce the DICI algorithm that provides an iterative 
matrix inversion procedure to obtain the local error covariances from the local information matrices. The inverses of 
the local information matrices are assimilated among the sub-systems such that the local error covariances, obtained 
after the assimilation, preserve the Gauss-Markovian structure of the centralized error covariances. Iterative matrix 
inversion can also be carried out using the distributed Jacobi algorithm [19] where the computational complexity 
scales linearly with the dimension, n, of the overall system. In contrast, the computational complexity of the DICI 
algorithms is independent of n. In addition, the error process of the DICI algorithm is bounded above by the error 
process of the distributed Jacobi algorithm. We show the convergence of the iterate step of the DICI algorithm 
analytically and resort to numerical simulations to show the convergence of its collapse step. 

In summary, the spatial decomposition of the complex large-scale systems, fusion algorithms for fusing observa- 
tions, the DICI algorithm to assimilate the local error covariance combine to give a robust, scalable, and distributed 
implementation of the Kalman filter. 

We describe the rest of the paper. Section [ll| covers the discrete-time models, centralized Information filters, and 



centralized L— banded Information filters. Section III covers the model distribution step. We introduce the local 



Information filters in section III-B along with the necessary notation. Section IV gives the observation fusion step of 



the local Information filters, and section |V] presents the distributed iterate collapse inversion (DICI) algorithm. The 



filter step of the local Information filters is provided in section VI and the prediction step of the local Information 



filters is provided in section VII We conclude the paper with results in section VIII and conclusions in section IX 



Appendix [l] discusses the L— banded inversion theorem, [20]. 

II. Background 

In this section, we motivate the type of applications and large-scale dynamical systems of interest to us. The 
context is that of a time-varying random field governed by partial differential equations (PDEs); these systems can 
also be generalized to arbitrary dynamical systems belonging to a particular structural class, as we elaborate in 
Subsection |II-A[ To fix notation, we then present the centralized version of the Information filter. 

A. Global Model 

Many physical phenomenon [21], [22], [23], [24], [25], [26], e.g., ocean/wind circulation and heat/wave equations, 
can be broadly characterized by a PDE of the Navier-Stokes type. These are highly non-linear and different regimens 
arise from different assumptions. Non-linear approximation models are commonly used for prediction (predictive 
models). For data assimilation, i.e., combining models with measured data, e.g., satellite altimetry data in ocean 
models, it is unfeasible to use non-linear models; rather linearized approximations (dynamical Unearization) are 
employed. Here, our goal is to motivate how discrete linear models occur that exhibit a sparse and localized 



structure that we use to distribute the model in Section III Hence, we take a very simplistic example and consider 



November 23, 2009 



DRAFT 



IEEE TRANSACTIONS ON SIGNAL PROCESSING 



5 



elliptical operators, given by, 



52 



92 



(1) 



^dpl ' dpi' 

where px and py represent the horizontal and vertical dimensions, respectively, and a, /? are constants pertinent 
to the specific application. The spatial model for the continuous -time physical phenomenon (e.g., heat, wave, or, 
wind), xt, with a continuous-time random noise term, ut, can now be written as. 



Cxt = Ut, 



(2) 



which reduces to a Laplacian equation, Axt = (A = d^/dp^ + d^/dpy), or to a Poisson equation, —Axt = Ut, 
for appropriate choices of the constants. We spatially discretize the general model in (|2| on an M x J uniform 
mesh grid, where the standard 2nd order difference equation approximation to the 2nd order derivative is 



d^xt 
dpi 



d'^Xt 



(3) 



where is the value of the random field, xt, at the ij-th location in the M x J grid. With this approximation, 
the Laplace equation (say), leads to the individual equations of the form 

Xij - ^ {xt,j-i + Xij+i + Xi-ij + Xi+ij) = 0. (4) 

Let I be an identity matrix and let A be a tridiagonal matrix with zeros on the main diagonal and ones on the 
upper and lower diagonals, then Q, V i, j G AI x J (with appropriate constants for modeling the elliptical operator 
in Q), can be collected in the linear system of equations and (|2| can be spatially discretized in the form 



GcUt, 



(5) 



where the vector be collects the boundary conditions, the term GcUt controls where the random noise is added and 

T 



(6) 



and xl = [xti , 



,Xij]'^. The matrix Fc is given by 



B 



I® B ■ 



(7) 



C B 

where B = /il + /3/iA, and C = (3yl, the constants p,, f3h, (3v are in terms of a, (3 in ([TJ, and (E) is the Kronecker 
product [27]. 

If we further include time dependence, e.g., by discretizing the diffusion equation, we get 



Xt = FcXt 



GcUf, 



(8) 



where Xj = dxt/dt. We again employ the standard first difference approximation of the first derivative with 
being the sampling interval and k being the discrete-time index, and, write the above equation as 



xfc+i = (I + AtFc) xfc - Arhc - ArGcUfc, 



fc > 



(9) 
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which can be generahzed as the following discrete-time model (where for simplicity of the presentation and without 
loss of generality we drop the term be in the sequel) 



Xfe+l 



Gufc. 



(10) 

is the model 



In the above model, G M" is the state vector, xo G M" are the initial state conditions, F e 
matrix, £ W' is the state noise vector and G e M"^^' is the state noise matrix. 

Here, we note that the model matrix, F, is is perfectly banded in case of PDEs as in (|7|. We can relax this to 
sparse and localized, i.e., the coupling among the states decays with distance (in an appropriate measure). Besides 
random fields, the discrete-space-time model ( [TO] i with sparse and localized structure also occurs, e.g., in image 
processing, the dynamics at a pixel depends on neighboring pixel values [28], [29]; in power systems, the power 
grid models, under certain assumptions, exhibit banded structures, [30], [31], [32]. Systems that are sparse but not 
localized can be converted to sparse and localized by using matrix bandwidth reduction algorithms [33]. We further 
assume that the overall system is coupled, irreducible, and globally observable. 

Let the system described in ( fTO] ) be monitored by a network of N sensors. Observations at sensor I and time k are 



(0 



H;Xfc 



(11) 



where H; e M^'^" is the local observation matrix for sensor I, pi is the number of simultaneous observations made 
by sensor I at time k, and w^'-* e RP' is the local observation noise. In the context of the systems we are interested 
in, it is natural to assume that the observations are localized. These local observations at sensor I may be, e.g., the 
temperature or height at location I or an average of the temperatures or heights at / and neighboring locations. 

We stack the observations at all N sensors in the sensor network to get the global observation model as follows. 
Let p be the total number of observations at all the sensors. Let the global observation vector, e Rp, the global 
observation matrix, H e M^^", and the global observation noise vector, Wfe e M^, be 



Yk 



(N) 

Vk 



H 



Hi 



H 



N 



,(1) 



AN) 



(12) 



Then the global observation model is given by 

yfc = Hxfc + Wfe. (13) 

We adopt standard assumptions on the statistical characteristics of the noise. The state noise sequence, {ufc}fc>o, 
the observation noise sequence, {wfc}fc>o, and the initial conditions, xq, are independent, Gaussian, zero-mean, with 

E[u,uf ] = QS,,r and E[w,wf ] = R(5„, and E[xox^] = So, (14) 

where the superscript H denotes the Hermitian, the Kronecker delta 6,^ = 1, if and only if t = t, and zero 
otherwise. Since the observation noise at different sensors is independent, we can partition the global observation 
noise covariance matrix, R, with R; e M^'^P' being the local observation noise covariance matrix at sensor /, as 

R = blockdiag[Ri, . . . , Rat]. (15) 
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For the rest of the presentation, we consider time-invariant models, specifically, the matrices, F, G, H, Q, R, are 
time-invariant. The discussion, however, is not limited to either zero-mean initial condition or time-invariant models 
and generalizations to the time-variant models will be added as we proceed. 

B. Centralized Information Filter 

Let Sfci^j and Sfen,_i be (filtered and prediction) error covariances, and their inverses be the information matri- 
ces, Z/j.|fc and Zfc|fc_i. Let Sik\k and x^i^.i be the filtered estimate and the predicted estimate of the state vector, x^, 
respectively. We have the following relations. 

(16) 
(17) 
(18) 
(19) 

Define the n— dimensional global observation variables as 

ifc - H^R-Vfc, (20) 
I = H'^R-^H, (21) 
and the n— dimensional local observation variables at sensor I as 

U,k = HfR-Vi'\ (22) 
Ji = UfR-'Ui. (23) 

When the observations are distributed among the sensors, see ( [TT| ), the CIF can be implemented by collecting all 
the sensor observations at a central location; or with observation fusion by realizing that the global observation 
variables in (|20|— ([21), can be written as, see [3], [11], [7], 

H^RVfe 





~ ^k\k 


Sfc|i;-1 


- 7-1 
^ ^k\k-l 


Zfe|fc-1 


~ "^klk-l^klk-l 







Ife 



HfRrvr + ... + H^R^vi^ 



N 



Similarly, 



The filter step of the CIF is 



J2U,k- (24) 

1=1 

N 

X=^J,. (25) 



1=1 



N 



T^k\k = Zfe|fc_i+^Xz, fc>0 (26a) 

1=1 

N 

Zk\k = ^k\k-i + ^'u,k, fc > 0. (26b) 

1=1 
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The prediction step of the CIF is 

Zfc|fe-i = S-|i_^ = (FZ-^^|,_^F^+GQG^)-\ fc>l, Zo|-i-So\ (27a) 

Zfe|fe-i = ^k\k-i {FZj.^in,_iZfc_i|fc_iJ , fc > 1, Zo|_i = 0. (27b) 



C. Centralized L— Banded Information filters 

To avoid the 0{n^) computations of the global quantities in ( |27] i, e.g., the inversion, Z'^^^f,_^, we approximate 
the information matrices, Zk\k and Z^^k-i, to be L— banded matrices, Z^n, and Zfc|fc_i. We refer to the CIF with 
this approximation as the centralized i— banded Information filter (CLBIF). This approach is studied in [34], where 
the information loss between Z and Z, is given by the divergence (see [34]) 



Divergence(Z,Z) = ^ | jz"? (z - z) Z"^ 1 1^ < ^ (^^ A"!,^ (^y^ 



z z 



2 



(28) 



where || • ||f is the Frobenius norm and Ai(z) is the ith eigenvalue of the matrix Z. 

This approximation on the information matrices is equivalent to approximating the Gaussian error processes, 

efc|fe = x/j-Xfe|fe, efc|fc_i Xfc - Xfcife^i, (29) 

to be Gauss-Markovian of Lth order [16]. Reference [20] presents an algorithm to derive the approximation that is 
optimal in Kullback-Leibler or maximum entropy sense in the class of all L— banded matrices approximating the 
inverse of the error covariance matrix. In the sequel, we assume this optimal L— banded approximation. 

The CLBIF (with the banded information matrices, Z^-n- and Zm._i) is given by the filter step in ( |26a| l— ( pFb] ) 
and the prediction step in p7a| i— ( |27b] i, where the optimal information matrices, Zfe^. and Z^i^.i, are replaced by 
their L— banded approximations. The algorithms in [20], [35] reduce the computational complexity of the CLBIF 
to 0{n^) but the resulting algorithm is still centralized and deals with the dimensional state. To distribute the 
CLBIF, we start by distributing the global model ([TO|i— (pj) in the following section. 

III. Spatial Decomposition of Complex Large-Scale Systems 
Instead of implementing CLBIF based on the global model, we implement local Information filters at the sub- 



systems obtained by spatially decomposing the overall system. Subsection III-A deals with this decomposition by 
exploiting the sparse and localized structure of the model matrix, F. In the following, we devise a sub-system at 
each sensor monitoring the system. This can be applied to general model matrices, F, but is only practical if F is 
sparse and localized as we explain below. 

A. Reduced Models at Each Sensor 

This subsection shows how to distribute the global model ( [TO] l and ([T3|, in order to get the reduced order sub- 
systems. We illustrate the procedure with a simple example that reflects our assumptions on the dynamical system 
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Structure. Consider a five dimensional system with the global dynamical model 



Xfc+l 



hi 
/21 
/31 





/12 
I22 












/33 
/43 






hi 




hi 







hb 
























.932 












551 






Ufc 



(30) 



The system has two external noise sources ~ [wifc, U2kY' ■ We monitor this system with iV = 3 sensors, having 
scalar observations, yf^ , at each sensor I. The global observation vector, yt, stacks the local observations, yf^, and is 



(31) 









1 


1 


1 





















1 


1 


1 





X/c + 



















1 


1 







Wfe, 



where H — [Hf Hg We distribute global model of equations (|30]l and ( (3T] i in the following Subsections. 

1 ) Graphical Representation using System Digraphs: A system digraph visualizes the dynamical interdependence 
of the system. A system digraph, [13], = [V, £'], is a directed graphical representation of the system, where V — 
X U [/ is the vertex set consisting of the states, X — {xi}i=i....^„, and the noise inputs, U = {ui}i=i,...,j. The 
interconnection matrix, E, is the binary representation (having a 1 for each non-zero entry) of the model matrix, F, 
and the state noise matrix, G, concatenated together. The interconnection matrix, E, for the system in (|30]l is, 

1 1 
110 10 
E^ 1 1 1 . (32) 
10 10 
1 1 1 
The system digraph is shown in Figure |l(a)| 

2) Reduced Models from the Cut-point Sets: We have = 3 sensors monitoring the system through the 
observation model ( [3T| l. We associate to each sensor I a cut-point set, where V*^'' C AT. If a sensor observes 
a linear combination of several states, its cut-point set, includes all these states; see [36] for the definition of 
cut-point sets, and algorithms to find all cut-point sets and a minimal cut-point set, if it exists. The cut-point sets 
select the local states involved in the local dynamics at each sensor From ( (3T] i, the cut-point set^ are shown in 

'For simplicity of the pre.sentation, we chose here that each state variable is observed by at least one sensor. If this is not true, we can easily 
account for this by extending the cut-point sets, V^^\ to V^^\ such that 



T7(0 



X. 



(33) 
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(a) (b) 

Fig. 1. System Digraph and cut-point sets: (a) Digraph representation of the 5 dimensional system, j30|— jsTJ. The circles represent the 
states, X, and the squares represent the input noise sources, u. (b) The cut-point sets associated to the 3 sensors (A) are shown by the dashed 
circles. 





r 


p(i) 




D") 






/// fl2 
fl! fll 

hi 





fn 




f24 















"0 "0" 



f43 




u 

f54 


/« 






V 




-■^-■■-y 

pi) 



Fig. 2. Partitioning of the global model matrix, F, into local model matrices, FW, and the local internal input matrices, D^, shown for 
sensor 1 and sensor 3, from the example system, j30) — (TT) . 



Figure ?? where we have the following cut-point set, e.g., at sensor 1, 

= {a:i,a:2,X3}. (34) 

Dimension of the sub-systems: The local states at sensor /, i.e., the components of the local state vector, x^'\ 
are the elements in its associated cut-point set, The dimension of the local Kalman filter implemented at sensor 
I is now ni. The set of ri; -dimensional local Kalman filters will give rise, as will be clear later, to an L-banded 
centralized Information matrix with L = min(ni, . . . , um)- The loss in the optimality as a function of L is given by 
the divergence (|28]l. Hence, for a desired level of performance, i.e., for a fixed L, we may need to extend (include 



additional states in a cut-point set) the cut-point sets, V^''\ to such that 



L 

>L, V /, (35) 



where | • | when applied to a set denotes its cardinality. This procedure of choosing an L based on a certain desired 
performance gives a lower bound on the dimensions of the sub-systems. 
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Coupled States as Inputs: The directed edges coming into a cut-point set are the inputs required by the reduced 
model. In the context of our running illustration (|30ll— ((3T]l, we see that the local state vector for sensor 1 is, x^.^"* = 
[xi.k,X2,k,X3^k]'^ , and the inputs to the local model consist of a subset of the state set, X, (at sensor si, X4^k is 
the input coming from sensor S2) and a subset of the noise input set, U, (1*2, fc at sensor si). 

Local models: For the local model at sensor /, we collect the states required as input in a local internal input 
vector, d^,''' (we use the word internal to distinguish from the externally applied inputs), and the noise sources 
required as input in a local noise input vector, u^'''. We collect the elements from F corresponding to the local 
state vector, x^''', in a local model matrix, F^''. Similarly, we collect the elements from F corresponding to the 
local internal input vector, d^,''', in a local internal input matrix, D^'^ and the elements from G corresponding to 
the local noise input vector, u^'\ in a local state noise matrix, G'''. Figure [2] shows this partitioning for sensors si 
and S3. We have the following local models for (l30|. 



'■fc+i 



hi 
/21 
/31 



/12 

/22 
/33 















/24 


XA,k + 












ff32 



U2,k, 



F(i)x«+D«dl 



G(i)u« 



(36) 



,(2) _ 



/22 
/33 
/43 



/42 







/21 







/31 










fib 



F(2)xf +D(2)df +G(2)uf' 



X5.k 





532 





U2,k, 



(37) 



,(3) 






fib 




fi3 


X3M + 





hi 


hb 









.951 



r (3)^(^3) 



(3)h(3) 



j(3)„(3) 



Ul.k, 



(38) 



We may also capture the above extraction of the local states by the cut-point sets, with the following procedure. 
Let the total number of states in the cut-point set at sensor I, V^^K be ni. Let be an ni x n selection matrix, 
such that it selects ni states in the cut-point set, V^''\ from the entire state vector, Xfe, according to the following 
relation. 



TiXfe. 



(39) 



For sensor 1, the selection matrix, Ti, is 



Ti 



1 
10 
10 



(40) 
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Initialize 
VI-A 




Fig. 3. Block Diagram for the LIFs: Steps involved in the LIF implementation. The ovals represent the steps that require local communication. 

We establish a reduced local observation matrix, H*^'\ by retaining the terms corresponding to the local state 
vector, x^'\ from the local observation matrix, H/. We may write 

H(')=H;Tf, (41) 

where denotes the pseudo-inverse of the matrix. In the context of the running illustration, the reduced local 
observation matrix H*^^) ~ [1, 1, 1] is obtained from the local observation matrix Hi — [1, 1, 1, 0, 0]. Note that H; 
picks the states from the global state vector, x^, whereas H^'^ picks the states from the local state vector, xj,'''. The 
reduced local observation models are given by 

yi')=H(M)+wi'). (42) 

We now make some additional comments. For simplicity of the explanation, we refer to our running exam- 
ple, ([so])— ((3T]i. We note that the reduced models at the sensors overlap, as shown by the overlapping cut-point 
sets in Figure ??. Due to this overlap, observations corresponding to the shared states are available at multiple 
sensors that should be fused. We further note that the reduced model (|36]l at sensor 1 is coupled to the reduced 
model ( (37] l at sensor 2 through the state X4,k- The state X4^k at sensor 1 does not appear in the local state vector, 
i.e., X4^k ^ But it is still required as an internal input at sensor 1 to preserve the global dynamics. Hence, 

(2) 

sensor 2 communicates the state 0:4 fc, which appears in its local state vector, i.e., x^^k G , to sensor 1. Hence 
at an arbitrary sensor /, we derive the reduced model to be 

xi'l, = F«xi')+D«4') + G(')uf. (43) 

Since the value of the state itself is unknown, sensor 2 communicates its estimate, a;^ j.^,, to sensor 1. This allows 
sensor 1 to complete its local model and preserve global dynamics, thus, taking into account the coupling between 
the local reduced-order models. This process is repeated at all sensors. Hence, the local internal input vector, d^'', 
is replaced by its estimate, d['|'j,. It is worth mentioning here that if the dynamics were time-dependent, i.e., the 
matrices, F, G, H, change with time, k, then the above decomposition procedure will have to be repeated at each k. 
This may result into a different communication topology over which the sensors communicate at each k. 
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1^23 Z«l Zj^i 



I Z34 ! 



r(l) 



Z55 



= z 



Fig. 4. Relationship between the global error covariance matrices, S and their inverses, the global information matrices, Z, with L = 1— banded 
approximation on Z. The figure also shows how the local matrices, and Z''', constitute their global counterparts. Since this relation holds 
for both the estimation and prediction matrices, the figure removes the subscripts. 



B. Local Information Filters 

To distribute the estimation of the global state vector, x^., we implement local Information filters (LIFs) at each 
sensor I, which are based on the sensor-based reduced models ( |43| l and ( |42| i. Each LIF computes local objects 
(matrices and vectors of dimension ni,) which are then fused (if required) by exchanging information among the 
neighbors. Some of the update procedures are iterative. Although, no centralized knowledge of the estimation of the 
global state exists, the union of the local state vector represents, in a distributed way, the knowledge that exists in a 
centralized fashion in the CLBIF. In most applications nowhere in the network is there the need for this centralized 
knowledge. 

The LIFs consist of initial conditions, a local filter step (including observation fusion and distributed matrix 
inversion) and a local prediction step (including estimate fusion), see Figure [3] These steps are presented in the 
next four sections. To proceed with the next sections, we provide notation in the following Subsection. 

1) Notation: We use the notation that the superscript (l) refers to a local reduced-order variable (n; x 1 vector 
or ni X ni matrix) at sensor I to define the reduced observation vector, and the reduced observation matrix, J'^^\ as 

ii') = (HW)^R-Vi'\ (44) 
I« ^ (H('))^RriH('). (45) 

The local error covariance matrices, S^^, and S[,'|'^. j^, are the overlapping diagonal submatrices of the global 
error covariance matrices, S^ij. and Sfe|j._i. Let Z^''^ and Z^'|''^_j^ be the local information matrices. These local 
information matrices are overlapping diagonal submatrices of the global L— banded information matrices, Z^ifc 
and Zf.^i^_i. These local matrices overlap because the reduced sensor-based models ( |43] l have overlapping state 
vectors, x[,'\ Figure |4]captures the relationship between the local error covariance matrices and the local information 
matrices given by ( [T6] l and ( [T7] l. 

IV. Overlapping Reduced Models 



After the model distribution step introduced in section III the reduced models among the sensors may share states. 



as shown by the overlapped cut-point sets in Figure ??. Since the sensors sharing the states have independent 
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Si S2 Si 



Fig. 5. Bipartite Fusion grapli, B, is shown for the example system, |30j— ijsTJ. 



observations of the shared states, observations corresponding to the shared states should be fused. We present 



observation fusion in subsection IV-A with the help of bipartite fusion graphs, [14]. 



A. Observation Fusion 

Equations ( p4j l and p5| ) show that the observation fusion is equivalent to adding the corresponding n— dimensional 
local observation variables, (p2ji~(|23]l. In CLBIF, we implement this fusion directly because each local observation 
variable in (|22ll— (|23|l corresponds to the full n— dimensional state vector, x^:. Since the 7i;— dimensional reduced 
observation variables, (|44l)— (|45]l, correspond to different local state vectors, x^'\ they cannot be added directly. 

To achieve observation fusion, we introduce the following undirected bipartite fusion graplj^ B. Let Sn = 
{si, . . . jSat} be the set of sensors and X be the set of states. The vertex set of the bipartite fusion graph, B, 
is Sn[JX. We now define the edge set, Es, of the fusion graph, B. The sensor Si is connected to the state 
variable Xj, if Si observes the state variable Xj. In other words, we have an edge between sensor Si and state 
variable Xj, if the local observation matrix, H^, at sensor s^, contains a non-zero entry in its jth column. Figure [5] 
shows the bipartite graph for the example system in (pO])— (|3T|. 

We now further provide some notation on the communication topology. Let Q denote the sensor communication 
graph that determines how the sensors communicate among themselves. Let JC{1) be the subgraph of Q, that contains 
the sensor I and the sensors directly connected to the sensor I. For the jth state, Xj, let Qj be the induced subgraph 
of the sensor communication graph, Q, such that it contains the sensors having the state Xj in their reduced models. 

The set of vertices in Qj come directly form the bipartite fusion graph, B. For example, from Figure |5] we 
see that Qi contains si as a single vertex, Q2 contains si,S2 as vertices, and so on. States having more than one 
sensor connected to them in the bipartite fusion graph, B, are the states for which fusion is required, since we have 
multiple observations for that state. Furthermore, Figure |5] also gives the vertices in the associated subgraphs, Qj, 
over which the fusion is to be carried out. 

With the help of the above discussion, we establish the fusion of the reduced observation variables, (|44ll— (|45]l. 
The reduced model at each sensor involves n; state variables, and each element in the n; x 1 reduced observation 
vector, i^''', corresponds to one of these states, i.e., each entry in if' has some information about its corresponding 

'*A bipartite graph is a graph whose vertices can be divided into two disjoint sets X and 5]v, such that, every edge connects a vertex in X 
to a vertex in Sjv, and, there is no edge between any two vertices of the same set, [37]. 
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State variable. Let the entries of the n; x 1 reduced observation vector, i^,'-*, at sensor I, be subscripted by the ni 
state variables modeled at sensor 1. In the context of the example given by system (|30ll— ([3T|, we have 



''k.xi 

k,X2 



;(2) 



.(2) 
^(2) 
■(2) 



;(3) 



.(3) 



(46) 



For each state Xj, the observation fusion is carried out on the sensors attached to this state in the bipartite fusion 
graph, B. The fused observation vectors denoted by ii'^^, are given by 



j(i) 



k,X2 
k,X3 



.(2) 

k,X2 

^(2) 

k,X3 



j(2) 



.(2) 



'•k,X2 



k^xs k,X3 



.(2) 

k.X4 



Generalizing to the arbitrary sensor /, we may write the entry, ifj^ ^.y corresponding to xj in the fused observation 



^(3) 

(0 



j(3) 



€1 +€l 

'■k,X5 



(47) 



vector, i^pf, as 



■(0 

^f.k^Xj 



k.Xi ' 



(48) 



(s) (s) 

where ^ is the entry corresponding to Xj in the reduced observation vector at sensor s, i^. ' . 

Since the communication network on Qj may not be all-to-all, an iterative weighted averaging algorithm [15], is 
used to compute the fusion in (|48j over arbitrarily connected communication networks with only local communica- 
tion (the communication may be multi-hop). A similar procedure on the pairs of state variables and their associated 
subgraphs can be implemented to fuse the reduced observation matrices, I*^''. Since, we assume the observation 
model to be stationary (H and R are time-independent), the fusion on the reduced observation matrix, is to 
be carried out only once and can be an offline procedure. If that is not the case and H and R are time dependent 
fusion on T has to be repeated at each time, k. 

Here, we assume that the communication is fast enough so that the consensus algorithm can converge, see [38] 
for a discussion on distributed Kalman filtering based on consensus strategies. The convergence of the consensus 
algorithm is shown to be geometric and the the convergence rate can be increased by optimizing the weight matrix 
for the consensus iterations using semidefinite programming (SDP) [39]. The communication topology of the sensor 
network can also be improved to increase the convergence speed of the consensus algorithms [40], [41], [42]. 

Here, we comment on estimate fusion. Since we fused the observations concerning the shared states among the 
sensors, one may ask if it is required to carry out fusion of the estimates of the shared states. It turns out that 
consensus on the observations leads to consensus on the estimates. This will become clear with the introduction of 
the local filter and the local prediction step of the LIFs, therefore, we defer the discussion on estimate fusion to 
section IVII-CI 



V. Distributed Matrix Inversion with Local Communication 
In this section, we discuss the fusion of the error covariances. Consider the example model (|30l)— (|3T]i, when we 



employ LIFs on the distributed models ([36|— (|38]l. The local estimation information matrices, Z^!],, Z^^^, and Z^'^' 
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correspond to the overlapping diagonal submatrices of the global 5x5 estimation information matrix, 2^^,, see 
Figure |4] with L = 1— banded assumption on Zk\k- It will be shown (section VII-Ai that the local prediction 



information matrix, z['^j^|j., is a function of the local error covariance matrices, s[,'|''^, and hence we need to 
compute Sj^^j, from the local filter information matrices, Z^^i^, which we get from the local filter step (section 



VI I. 



As can be seen from Figure |4] and ( fTS] ), for these local submatrices, 

S(0 ^ (^z(')) ^ . (49) 

Collecting all the local information matrices, Z^'-*^, at each sensor and then carrying out an nxn matrix inversion is 
not a practical solution for large-scale systems (where n may be large), because of the large communication overhead 
and 0{n^) computational cost. Using the L— banded structure on the global estimation information matrix, Zk\k, we 
present below a distributed iterate collapse inversion overrelaxation (DICI-OR, pronounced die-see-O-R) algorithrrj^ 
For notational convenience, we disregard the time indices in the following development. 

We present a generalization of the centralized Jacobi overrelaxation (JOR) algorithm to solve matrix inversion 



in section |V-A| and show that the computations required in its distributed implementation scale linearly with the 
dimension, n, of the system. We then present the DICI-OR algorithm and show that it is independent of the 
dimension, n, of the system. 

A. Centralized Jacobi Overrelaxation (JOR) Algorithm 

The centralized JOR algorithm for vectors [19] solves a linear system of n equations iteratively, by successive 
substitution. It can be easily extended to get the centraUzed JOR algorithm for matrices that solves 

ZS = T, (50) 

for the unknown matrix, S, where the matrices Z and T are known. Let M = diag(Z), for some 7 > 

St+i = ((1 - 7) Inx„ + (M - Z)) St + 7M-IT, (51) 

converges to S, and is the centralized JOR algorithm for matrices, solving n coupled linear systems of equations (|50|, 
where 7 is sometimes called a relaxation parameter [19]. Putting T = Inxn, we can solve for ZS = I„xn =^ 
S = Z^^, and, if Z is known, the following iterations converge to Z^^, 

St+i = P^St + 7M-i, (52) 

where the multiplier matrix, P^, is defined as 

P^ = (l-7)I„x„+7M"i(M-Z). (53) 

It is worth mentioning here that the DICI algorithm (for solving ZS = Inxn, with SPD L-banded matrix Z e M"^" and the n X n 
identity matrix l„xn) is neither a direct extension nor a generalization of (block) Jacobi or Gauss-Seidel type iterative algorithms (that solve 
a vector version, Zs = b with s,b e M", of ZS = I„xn, see [19], [43], [44], [45], [46].) Using the Jacobi or Gauss-Seidel type iterative 
schemes for solving ZS = I is equivalent to solving n linear systems of equations, Zs = b; hence, the complexity scales linearly with n. 
Instead the DICI algorithm employs a non-linear collapse operator that exploits the structure of the inverse, S, of a symmetric positive detinite 
-L— banded matrix, Z, which makes its complexity independent of n. 
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The Jacobi algorithm can now be considered as a special case of the JOR algorithm with 7=1. 

1) Convergence: Let S be the stationary point of the iterations in ( |52j i, and let Sf+i denote the iterations of ( [52) l. 
It can be shown that the error process, Et+i, for the JOR algorithm is 

E^+i - P^Et, (54) 

which decays to zero if IIP7II2 < 1, where || • II2 denotes the spectral norm of a matrix. The JOR algorithm 
( |52| i converges for all symmetric positive definite matrices, Z, for sufficiently small 7 > 0, see [19], an alternate 
convergence proof is provided in [47] via convex i\f -matrices, whereas, convergence for parallel asynchronous 
team algorithms is provided in [48]. Since, the information matrix, Z, is the inverse of an error covariance matrix; 
the information matrix, Z, is symmetric positive definite by definition and the JOR algorithm always converges. 
Plugging 7 = 1 in (|52]) gives us the centrahzed Jacobi algorithm for matrices, which converges for all diagonally 
dominant matrices, Z, see [19]. We can further write the error process as, 

Et+i = P;+i(So-S), (55) 

where the matrix So is the initial condition. The spectral norm of the error process can be bounded by 

||E«+i||2 < |A^ax(P7)r+Ml(So-S)||2, (56) 

where A,nax is the maximum eigenvalue (in magnitude) of the multiplier matrix, P-^. The JOR algorithm is 
centralized as it requires the complete n x n matrices involved. This requires global communication and an nth 
order computation at each iteration of the algorithm. We present below its distributed implementation. 

2) Distributed JOR algorithm: We are interested in the local error covariances that lie on the L— band of the 
matrix S = Z^^. Distributing the JOR (in addition to [19], distributed Jacobi and Gauss Seidel type iterative 
algorithms can also be found in [43], [45], [47]) algorithm ( |52j i directly to compute the L— band of S gives us the 
following equations for the ij-th element, Sij, in St+i, 

Sij,t+i = Pi^l, iy^j, (57) 

Sij,t+i = Ptsl+m~j^^, i^j, (58) 

where; the row vector, p^, is the ith row of the multiplier matrix, P-,,; the column vector, s^, is the jth column of the 
matrix, S^; and the scalar element, ma, is the ith diagonal element of the diagonal matrix, M. Since the matrix Z is 
i— banded, the multiplier matrix, P^, in (|53j is also L— banded. The ith row of the multiplier matrix, P-^, contains 
non-zeros at most at 2L + 1 locations in the index set, k — {i — L, . . . ,i, . . . ,i + L}. These non-zero elements pick 
the corresponding elements with indices in the index set, k, in the jth column, s^, of Sf. Due to the L— handedness 
of the multiplier matrix, P-y, the JOR algorithm can be easily distributed with appropriate communication with the 
neighboring sensors. 

A major drawback of the distributed JOR algorithm is that at each sensor the computation requirements scale 
linearly with the dimension, n, of the system. This can be seen by writing out the iteration in ( |57] l, e.g., for an 
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L— banded element such that \i — j\ = L. In the context of Figure |4] we can write 345,4+1 from ( |57] i as 

S45,t+1 = P43S35,t +P44S45,t +P45S55,t (59) 

The element 335 4 does not lie in the L— band of S, and, hence, does not belong to any local error covariance 
matrix, S'^'-'. Iterating on it using ( [57] ) gives 

S35,t+1 = P32S25,t + P33S35,t + P34S45,t- (60) 

The computation in ( |60| ), involves S25.t, iterating on which, in turn, requires another off _L— band element, sis^t, 
and so on. Hence, a single iteration of the algorithm, although distributed and requiring only local communication, 
sweeps the entire rows in S and the computation requirements scale linearly with n. We now present a solution to 
this problem. 



B. Distributed Iterate Collapse Inversion Overrelaxation (DICI-OR) Algorithm 

In this section, we present the distribute iterate collapse inversion overrelaxation (DICI-OR) algorithm. The DICI- 
OR algorithm is divided into two steps: (i) an iterate step and (ii) a collapse step. The iterate step can be written, 
in general, for the ij-th element that lies in the L— band (\i — j\ < L) of the matrix ^t+i as 

Su,t+i = P»St, i^j, \i-j\<L, (61) 

Si]d+i = p.,s^+TO,^\ i^j,\i-i\<L, (62) 



where the symbols are defined as in section V-A.2 



As we explained before in section V-A.2 the implementation of (|6T|-(|62]l requires non L— banded elements 
that, in turn, require more non L— banded elements. To address this problem we introduce a collapse step. We 
assume that is the inverse of an i— banded matrix and use the result^ in [35], to compute a non L— band 
element {sij such that \i — i\ > L) from the L— band elements (sij such that — < L). In general, a non i— band 
element in a matrix whose inverse is L = 1— banded can be written as 

Sij" = \'i-j\>L (63) 

which gives us the collapse step. In the context of Figure |4j instead of iterating on S35 as in ( [60) 1, we employ the 
collapse step, 

S35,i = S34,tS44';tS45,t, (64) 

that prevents us from iterating further on the non L— banded elements. 

*If S is the inverse of an L— banded matrix, tlien tlie submatrices that do not lie in the L— band of S, can be computed from the submatrices 
that lie in the L— band of S, [35]. So, to compute the inverse, S = Z~^, of an L— banded matrix, Z, we just compute the submatrices that lie 
in the L— band of its inverse, S; the remaining submatrices are derived from these using the expressions given in [35]. 
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The initial conditions of the DICI-OR algorithm are given by 

Pm = (l-7)li'L + 7(M(")"'(M(')-Z(')), (65) 

S^'^ = (ZC))"' (66) 



Note that equations (|65])-(|66| do not require any communication and can be computed at each sensor directly from 
the local information matrix, Z*^''. This is because the matrix M is diagonal, and its local submatrix, M^'^ in ( |65] l, 
is the exact inverse of the matrix formed by the diagonal elements of Z^^'^ 

We refer to equations (|6T]l— (|63|l, combined with (|65|l-(|66|), and appropriate communication from neighboring 
sensors as the DICI-OR algorithm. The DICI-OR algorithm can be easily extended to L > 1. The only step to 
take care of is the collapse step, since ( |63] l holds only for L = 1. The appropriate formulae to replace ( |63] l, 
when L > 1, are provided in [35]. The computation requirements for the DICI algorithm are independent of n 
and DICI provides a scalable implementation of the matrix inversion problem. The DICI algorithm (without the 
overrelaxation parameter, 7) can be obtained from the DICI-OR algorithm by setting 7 = 1. 

1) Convergence of the DICI-OR algorithm: The iterate and the collapse step of the DICI algorithm over the 
entire sensor network can be combined in matrix form as follows. 

Iterate Step: Sf+i = P^S* + \i - i\ < L (67) 

Collapse Step: St+i = C(St+i), \i - j\ > L (68) 

The operator (^(•) is the collapse operator; it takes an arbitrary symmetric positive definite matrix and converts it to 
a symmetric positive definite matrix whose inverse is banded by using the results in [35]. The DICI algorithm 
is a composition of the linear iterate operator, P^ given in (|53|, followed by the collapse operator, C(-) given in 
(|63]l for L = 1 and in [35] for L > \. Combining ( |67| i and ( |68] l summarizes the DICI algorithm as, 

S = c(P^(s + (P^M)-i)) . (69) 

We define a composition map, T : H H, where H C M"^" is the set of all symmetric positive definite matrices, 
as T = Co P..^. To prove the convergence of the DICI-OR algorithm, we are required to show that the composition 
map, T, is a contraction map under some norm, that we choose to be the spectral norm || • II2, i.e., for a e [0, 1), 

||T(X3)-T(Yh)||2<«||Xh-Yh||2, VXh.YhGH (70) 

The convergence of the iterate step of the DICI-OR algorithm is based on the iterate operator, P^, which is proved 
to be a contraction map in [19]. For the convergence of the collapse operator, C,, we resort to a numerical procedure 
and show, in the following, that (|70]l is a contraction by simulating (|70]l 1.17 x 10^ times. 

For the simulations, we generate n x n matrices, Xj.^jjj, with i.i.d. normally distributed elements and get, 
Xsym = Xj.^jjjj + X^^^. We eigen-decompose Xsym = VAV^. We replace A with a diagonal matrix. Ah, 
whose diagonal elements are drawn from a uniform distribution in the interval (0,10]. This leads to a random 
symmetric positive definite matrix, 

Xh = VAhV^. 
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Fig. 6. Histogram of a: Simulations of the quotient of are performed 1.17 x 10^ times and the resuhs are provided as a histogram. 

For n = 100 and L a random integer between 1 and n/2 = 50, we compute, by Monte Carlo simulations, the 
quotient of < [70| ) 

||T(Xs)-T(Ys)||2 
||Xh-Yh||2 ■ 

The number of trials is 1.17 x 10^. The histogram of the values of a, in ( |7T| ), (with 1000 bins) is plotted in Figure |6] 
The maximum value of a found in these 1.17 x 10^ simulations is 0.9955 and the minimum value is 0.1938. Since 
a £ (0, 1), i.e., strictly less than 1, we assume that ( |70j i is numerically verified. 

2) Error Bound for the DICI-OR algorithm: Let the matrix produced by the DICI-OR algorithm at the t+ 1-th 
iteration be St+i- The error process in the DICI-OR algorithm is given by 

Et+i = %+i - S. (72) 

Claim: The spectral norm of the error process, ||Et-|_i||2, of the DICI-OR algorithm is bounded above by the 
spectral norm of the error process, ||E(+i||2, of the JOR algorithm. Since, the JOR algorithm always converges for 
symmetric positive definite matrices, Z, we deduce that the DICI-OR algorithm converges. 

We verify this claim numerically by Monte Carlo simulations. The number of trials is 4490, and we compute 
the error process, Ef+i(A'), of the DICI-OR algorithm and the error process, 'E,t+i{K), of the JOR algorithm. We 
choose the relaxation parameter, 7, to be 0.1. In Figure ??, Figure ??, and Figure ??, we show the following, 

max (||E,+i(i^)||2-||E,+i(i^)||2), 

mm (||Et+i(/C)||2-||Et+i(i^)||2), 

mean^ (||Et+i(i^)||2 ~ ||Et+i(i^)||2) , 

respectively, against the number of iterations of the JOR and DICI-OR algorithm. Since all the three figures show 
that the max, min, and the mean of the difference of the spectral norm of the two error processes, ||Et+i(iir)||2 — 
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(a) (b) (c) 

Fig. 7. Simulation for the error bound of the DICI-OR algorithm. 



||Et+i(i4r)||2, is always > 0, we deduce that (using equation pof ) 

||Et+l||2 < ||Et+i||2, 

< |A„,ax(P7)r+' ll(So-S)||2. (73) 

This verifies our claim numerically and provides us an upper bound on the spectral norm of the error process of 
the DICI algorithm. 

VI. Local Information Filters: Initial Conditions and Local Filter Step 
The initial conditions and the local filter step of the LIFs are presented in the next subsections. 

A. Initial Conditions 

The initial condition on the local predictor is 

- 0- (74) 

Since the local information matrix and the local error covariances are not the inverse of each other, ( |49] l, we obtain 
the initial condition on the prediction information matrix by using the L— banded inversion theorem [20], provided 
in appendix [l] This step may require a local communication step further elaborated in section VII-A 



Zg|'_-|^ L— Banded Inversion Theorem Sq (75) 

B. Local Filter Step 

In this section, we present the local filter step of the LIFs. The local filter step is given by 

- (76b) 
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IV- 



where x'j^ and denote the fused observation variables. Fusion of the observations is presented in section 
[A| The distribution of the addition operation, '+', in p6l ) is straightforward in (|76|. Recall that the observation 
fusion, ( |48l ), is carried out using the iterative weighted averaging algorithm. The asymptotic convergence of this 
iterative algorithm is guaranteed under certain conditions, e.g., connected sensor communication sub-graph, Qj, 
see [15], for further details. Hence, with the required assumptions on the sub-graph, Qj, the observation fusion 
algorithm, ( |48] l, asymptotically converges, and hence (with a slight abuse of notation), 

N N 

IJiJi^ife and IJxy^^J. (77) 

1=1 1=1 

The above notation implies that the local fused information variables, 2^ ^ and i^'^^, when combined over the entire 
sensor network, asymptotically converge to the global information variables, T and ij.. This, in turn, implies that 
the local filter step of the LIFs asymptotically converges to the global filter step, (|26]l, of the CLBIF. 



Once the local filter step is completed, the DICI algorithm is employed on the local information matrices, zj^^^, 
obtained from ( |76a| i, to convert them into the local error covariance matrices, S^'i'^,. Finally, to convert the estimates 
in the information domain, z^'j'j,, to the estimates in the Kaknan filter domain, i^j^, we specialize the DICI algorithm 
to matrix- vector product ([T9]l. 

VII. Local Information Filters: Local Prediction Step 
This section presents the distribution of the global prediction step, (|27]l, into the local prediction step at each 



LIF. This section requires the results of the DICI algorithm for the L— banded matrices, introduced in section [V] 

A. Computing the local prediction information matrix, Z^'|^^,_j^ 

Because of the coupled local dynamics of the reduced sensor-based models, each sensor may require that some 
of the estimated states be communicated as internal inputs, d^'^^,, to its LIF, as shown in ( |43) l. These states are the 
directed edges into each cut-point set in Figure ??. Hence, the error associated to a local estimation procedure is 
also influenced by the error associated to the neighboring estimation procedure, from where the internal inputs are 
being communicated. This dependence is true for all sensors and is reflected in the local prediction error covariance 
matrix, S^^^. ^^, as it is a function of the global estimation error covariance matrix, Sfe_i|fe_i. Equation ( |78| l follows 
from ( |27a| i after expanding p7a| i for each diagonal submatrix, S^'|''^ in Sk\k- 

Si'),_^ - F,Sfc_i|fc_iFf + G(')qWg(')^. (78) 

The matrix, F; = T;F, (the matrix T/ is introduced in ([39|) is an x n matrix, which relates the state vector, x^, 
to the local state vector, x^'\ Figure [2] shows that the matrix, F;, is further divided into F^'^ and D^'^. With this 
sub-division of F;, the first term on the right hand side of ( |78] l, F;Sj._i|fc_iF^, can be expanded, and ( [78] ) can be 
written as 

sSLi - F(')s«,„_,F(0- + F(Osf;t^i,D(0- 

+ (fWs^!:'411iDW^)'^ + D«sf iliDW^ + gWQ«G»^, (79) 
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where 

^k~i\k~i '■^^ local error covariance matrix, which is available from ( |76a| l and the DICI algorithm at sensor I; 

local error covariance matrix, which is available from ( |76a[ l and the DICI algorithm at the sensors 
having the states, d^.'-*, in their reduced models; 

S^'^-i\k-i error cross correlation between the local state vector, x^.'"*, and the local internal input vector, d^'''. 



The non L— banded entries in this matrix can be computed from the equation ( |64| ), see [35]. Since the model 
matrix, F, is sparse, we do not need the entire error covariance matrix, Sj._in._i, only certain of its submatrices. 
Since the model matrix, F, is localized, long-distance communication is not required, and the submatrices are 
available at the neighboring sensors. 

Once we have calculated the local prediction error covariance matrix, S^^j, j^, we realize ( |49] l and compute the 
local prediction information matrix, Z^'|^j, j^, using the L— banded Inversion Theorem (see [20] and appendix |ljl. 

Z^.'i'^, L— Banded Inversion Theorem sj^yj,_j^. (80) 

From ( |86l ) in appendix |lj to calculate the local prediction information matrix, Z^'i'^ j^, we only need the S^'|'^, 
from sensor T and from some additional neighboring sensors. Hence Z^^^ -i^ is again computed with only local 
communication and n;th order computation. 



B. Computing the local predictor, 



We illustrate the computation of the local predictor, z~,|^ j^, for the S—dimensional system, (|30ll— ((3T]i, with L = 1. 
The local predictor, z^lj^ j^, at sensor 3 follows from the global predictor, ( |27b[ ), and is given by 



^(3) 
^k\k- 



(f -(1) , f -(2) A 

^34 (^/Sia^i fc_i|fc_i + /33a;3 ^._i|j,_i j 





(81) 



where Z34 is the only term arising due to the L = 1— banded (tridiagonal) assumption on the prediction information 
matrix, 7im._i. Note that /3i2;i^^._i|;,_i + /33a;3^^._i|;j_i is a result of f3Xj,_i|j._i, where £3 is the third row of the 
model matrix, F. A model matrix with a localized and sparse structure ensures that f3Xfc_in,_i is computed from 
a small subset of the estimated state vector, communicated by a subset Q C K,[l) of the neighboring 

sensors, which are modeling these states in their reduced models. This may require multi-hop communication. 
Generalizing, the local predictor in the information domain, z^'i'^, j^, is given by 



^(0 



zSLi(F<'^ei,.-i + D(')dl?l,,_,)+/i(zg2_i,F(v),x£^^^^ (82) 
for some V, Q C JC{1), where /i(-) is a linear function and depends on L. 



C. Estimate Fusion 

We present the following fact. 

Fact: Let m denote the iterations of the consensus algorithm that is employed to fuse the observations. As 
m 00, the local estimates, z[,'|''j,, in ( |76b| i also reach a consensus on the estimates of the shared states. 
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It is straightforward to note that as m cx3 we have a consensus on the estimates (of the shared states) in the 
local filter step ( |76b[ ), if we have a consensus on the local predictors (of the shared states), Zk^^-i- To show the 
consensus on the local predictors, we refer back to our illustration and write the local predictors for sensor 2 as 
follows, 

212 (/ll^l^Ll|fe-l + /l24^Ll|fc-l) 

. (83) 

f n ^(2) . r ^(3) \ 

The predictor for the shared state X4^k can now be extracted from ( (ST) and ([83| and can be verified to be the 
following for I = 2,3. 

The elements z^j belong to the prediction information matrix, which is computed using the DICI algorithm and the 
banded inversion theorem. It is noteworthy that the DICI algorithm is not a consensus algorithm and thus the 
elements Zij are the same across the sensor network at any iteration of the DICI algorithm. With a consensus on 
the local predictors, the iterations of the consensus algorithm on the observations lead to a consensus on the shared 
estimates. 

VIII. Results 

A. Summary of the LIFs 

We summarize the distributed local Information filters. The initial conditions are given by ^TA\ and ( |75] l. 

Observation fusion is carried out using ( |48] l. The fused observation variables, 1^?^^, and 1^'^, are then employed 

HI) 



in the local filter step, ( |76a[ ) and ( |76b| i, to obtain the local information matrix and the local estimator, Z^|^ and 

't\k 



z^9j,, respectively. We then implement the DICI algorithm (|6r|— (|62|) and ( |63] l to compute the local error covariance 



matrix, S^^''^,, from the local information matrix, Z^'|'^,. The DICI algorithm is again employed to compute the local 
estimates in the Kalman filter domain, x^,'"'^,, from the local estimator, z^'''^,, as a special case. Finally the local 
prediction step is completed by computing the local prediction error covariance matrix, the local prediction 

information matrix, Z^'^'^. ^^, and, the local predictor, z^'''^ j^, from ^f9\ , (|80]l, and ( [82) l, respectively. 

B. Simulations 

We simulate a. n = 100— dimensional system with = 10 sensors monitoring the system. Figures ?? and ?? 
show the non-zero elements (chosen at random) of the model matrix, F, such that ||F||2 = 1. The model matrix 
in Figure ?? is L = 20— banded. The model matrix in Figure ?? is L = 36— banded that is obtained by employing 
the reverse Cuthill-Mckee algorithm [33] for bandwidth reduction of a sparse random F. The non-zeros (chosen 
at random as Normal(0, 1)) of the global observation matrix, H, are shown in Figure |9] The kh row of the global 
observation matrix, H, is the local observation matrix, H;, at sensor I. Distributed Kalman filters are implemented 
on (i) F in ?? and H in Figure |9j and (ii) F in ?? and H in Figure |9] The trace of the error covariance matrix. 
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Fig. 8. (a & b) Non-zero elements (chosen at random) of 100 X 100. L = 20— banded (Figure ??) and L = 36— banded (Figure ??) model 
matrices, F, such that ||F||2 = 1. (c & d) Distributed Kalman filter is implemented on the model matrices in Figure ??-?? and the global 
observation matrix, H (Figure [9}, in Figure ??-??. The expectation operator in the trace (on horizontal axis) is simulated over 1000 Monte 
Carlo trials. 
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Fig. 9. Global observation matrix, H. The non-zero elements (chosen at random) are shown. There are A'^ = 10 sensors, where the Ith row 
of H corresponds to the local observation matrix, H;, at sensor I. The overlapping states (for which fusion is required) can be seen as the 
overlapping portion of the rows. 
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Fig. 10. Performance of the DICI algorithm as a function of the number of DlCl iterations, t. 



Sfc|fc, is simulated for different values of L in [n, 1, 2, 5, 10, 15, 20] and the plots are shown (after averaging over 
1000 Monte Carlo trials) in Figure ?? for case (i); and in Figure ?? for case (ii). The stopping criteria for the DICI 
algorithm and the consensus algorithm are such that the deviation in their last 10 iterations is less than 10^^. In 
both Figure ?? and Figure ??, tT{Sk\k) represents the trace of the solution of the Riccati equation in the CIF (no 
approximation). 

With 1000 Monte Carlo trials, we further simulate the trace of the error covariance, tr{Sk\k)^ for case (ii) and 
L — 20— banded approximations as a function of the number of iterations, t, of the DICI-OR algorithm. We 
compare this with (a) the simulation obtained from the 0{n'^) direct inverse of the the error covariance (with 
L — 20— banded approximation on its inverse); and (b) tr(Sfc|fc), trace of the solution of the Riccati equation of 
the CIF (no approximation). We choose t — [1,10,30,100,200] for the DICI algorithm and show the results in 



Figure 10 As 1 1, the curves we obtain from the DICI algorithm get closer to the curve we obtain with the direct 
inverse. 

The simulations confirm the following: (i) The LIFs asymptotically track the results of the CLBIF, see Figure [TOj 
(ii) We verify that as L f, the performance is virtually indistinguishable from that of the CIF, as pointed out 
in [35]; this is in agreement with the fact that the approximation is optimal in Kullback-Leibler sense, as shown 
in [20]. Here, we also point out that, as we increase L the performance increases, but, we pay a price in terms 
of the communication cost, as we may have to communicate in a larger neighborhood, (iii) If we increase the 
number of Monte Carlo trials the variations reduce, and the filters eventually follow the solution of the Riccati 



equation, tr(S;j|fc); (iv) The curve in Figure 10 with t = 1 shows the decoupled LIFs, when the global error 
covariance is treated as a block diagonal matrix. This is unstable as we do not fuse the error covariances and the 
individual sub-systems are not observable. We next discuss the computational advantage of the LIFs over some of 
the existing methods. 
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C. Complexity 

We regard the multiplication of two n x n matrices as an 0{n^) operation, inversion of an n x n matrix also as 
an 0{n^) operation and multiplication of an n x n matrix and an n x 1 vector as an 0{n^) operation. For all of 
the following, we assume N sensors monitoring the global system, ( [TO] l. 

1) Centralized Information Filter, CIF: This is the case where each sensor sends its local observation vector 
to a centralized location or a fusion center, where the global observation vector is then put together The fusion 
center then implements the CIF, with an 0{n^) computation complexity for each time fc, so we have the complexity 
as 0{ii?k), with inordinate communication requirements (back and forth communication between the sensors and 
the central location). 

2} Information Filters with Replicated Models at each Sensor and Observation Fusion: In this scheme, the local 
observation vectors are not communicated to the central location, but are fused over an all-to-all communication 
network, [3], or an arbitrary network, [7], that requires an iterative algorithm. Computational complexity at each 
sensor is 0{n?k) in [3]. In [7], let the complexity of the iterative algorithm be 0{if{n)) and let be the 
number of iterations required by the iterative algorithm to converge. Each sensor implements a CIF after fusing 
the observations. For each time index, fc, each sensor requires 0{n?) operations plus the operations required for 
the iterative algorithms, which are 0{ifi{n)t^); so the total computation complexity is 0{{n^ + ip{n)t^)k) at each 
sensor Communication requirements are global in [3], and local in [7]. 

3) Distributed Kalman Filters: Local Information Filters, LIFs: The distributed Kalman filter presented in this 
paper has three iterative algorithms. In all the other steps the computation complexity is dominated by 0{n^), 
where ni ^ n. Let to be the iterations required by the weighted averaging algorithm, where at each step of 
the iterative algorithm the computations are dominated by 0{n^). Let tj^ be the iterations required by the DICI 
algorithm for vectors, where at each step of the iterative algorithm the computations are dominated by an 0{L^) 
operations. Let tj^ be the iterations required by the DICI algorithm, where at each step of the iterative algorithm 



the computations are dominated by 0{L ) operations. Recalling that L ^ ni from section III the total computation 
complexity is 0{{nf + nfto + nftj-^ + nftj2)k). Let = max(to, ij^ , ij^), then the computation complexity is 
bounded by 0{nft^k) at each sensor for the LIFs, which is much smaller than the computational cost of the 



solutions in |Vni-C.l| and Vni-C.2 The communication requirement in the LIFs may be multi-hop but is always 



constrained in a neighborhood because of the structural assumptions on the model matrix, F. 

IX. Conclusions 

In conclusion, we presented a distributed implementation of the Kalman filter for sparse large-scale systems 
monitored by sensor networks. In our solution, the communication, computing, and storage is local and distributed 
across the sensor network, no single sensor processes ?i— dimensional vectors or matrices, where n, usually a 
large number, is the dimension of the state vector representing the random field. We achieve this by solving 
three linked problems: (1) Distributing the global state variable model representing the random field into sensor 
based reduced-order models. These reduced-order models are obtained using a graph-theoretic model distribution 
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technique that decomposes the random field model into coupled low-order models; (2) Fusing, through distributed 
averaging, multiple sensor observations of the state variables that are common across sub-systems; and (3) Inverting 
fiill n— dimensional matrices, with local communication only, by deriving an iterative distributed iterate collapse 
inversion (DICI) algorithm. The DICI algorithm only requires matrices and vectors of the order n; of the reduced 
state vectors. The DICI algorithm preserves the coupling among the local Kalman filters. Our solution contrasts 
with existing Kalman filter solutions for sensor networks that either replicate an n— dimensional Kalman filter at 
each sensor or reduce the model dimension at the expense of decoupling the field dynamics into lower-dimensional 
models. The former are infeasible for large-scale systems and the latter are not optimal and further cannot guarantee 
any structure of the centralized error covariances.. 

Simulations show that the distributed implementation with local Kalman filters implemented at each sensor 
converges to the global Kalman filter as the number of bands in the information matrices is increased. 



Appendix I 
L-Banded Inversion Theorem 

Let Z = S^^ be L— banded. We apply the algorithm, given in [20], to obtain Z from the submatrices in 
the L— band of S, as . We use the following notation, in ([85|, to partition matrix addition and subtraction in terms 
of its constituent submatrices. Also represents the principal submatrix of S spanning rows i through j, and 
columns i through j. 



ai 


02 









ai a2 




X + y -\ 


- z 04 











05 


06 







z a4 
0,5 as 



(85) 



The inverse of S, when Z = is L— banded, is given by ( (86) 1, taken from [20], in terms of the banded 
submatrices of S. Note that, to compute a principal submatrix in Z, we do not need the entire S, or even all 
the L— banded submatrices in S. Instead, we only require three neighboring submatrices in the L— band of S. For 
proofs and further details, the interested reader can refer to [20]. 
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